Fitting GP to lightcurve 428_…in Stan

Notebook outlining the fitting of GP to thunderKAT lightcurve ID$ 428_…

Basic Model

  • Zero constant mean function
  • Homoskedastic noise
  • Not using data on error in observations
  • Wide prior on observational errors

\[y \sim \mathcal{N}(f(x), \sigma_\textrm{noise}^2)\]

\[f \sim \mathcal{GP}(\boldsymbol{0}, k(x, x'))\]

\[k(x,x') = \eta^2 \exp\left\{ -\frac{1}{2}\frac{(x - x')^2}{\ell^2}\right\}\]

\[\ell \sim \mathrm{InvGamma}(5,5)\]

\[\eta \sim \mathcal{N}^+(0,1)\]

\[\sigma_\textrm{noise} \sim \mathcal{N}^+(0,1)\]

MCMC Results

 variable       mean     median        sd       mad        q5        q95
    eta     0.184588   0.152648  0.113476  0.065523  0.083235   0.394774
    ell   150.580662 132.664000 78.972809 60.348195 63.226605 295.029000
    sigma   0.006285   0.006116  0.001137  0.001059  0.004713   0.008371
     rhat ess_bulk ess_tail
 0.999966     2961     2328
 1.001321     2922     2491
 1.000792     3212     2674

Posterior Predictive Samples

The fitted model has a very long lengthscale, comparable to the length of the observational window. The estimated observational noise has a standard deviation more than an order of magnitude of that in the original data. The combination of these parameters has lead to a very smooth fit that passes through the middle of the observed data points rather than through any datapoints themselves.

Observational Errors Model

  • Zero constant mean function
  • Include data on error in observations of \(y_i\)
  • Heteroskedastic (Gaussian) noise

\[y_i \sim \mathcal{N}(f(x_i), \sigma_i^2)\]

\[f \sim \mathcal{GP}(\boldsymbol{0}, k(x, x'))\]

\[k(x,x') = \eta^2 \exp\left\{ -\frac{1}{2}\frac{(x - x')^2}{\ell^2}\right\}\]

\[\ell \sim \mathrm{InvGamma}(5,5)\]

\[\eta \sim \mathcal{N}^+(0,1)\]

\[\sigma_i \sim \mathcal{N}^+(\textrm{stderr}(y_i), \mathrm{Var}(\textrm{stderr}(\boldsymbol{y})))\]

MCMC Results

 variable      mean    median       sd      mad        q5       q95     rhat
 eta       0.139203  0.135184 0.027459 0.025415  0.102757  0.189866 0.999570
 ell      11.188619 11.204300 0.493513 0.472801 10.334480 11.964810 1.001436
 sigma[1]  0.000393  0.000393 0.000042 0.000042  0.000326  0.000461 1.004134
 ess_bulk ess_tail
     6179     3134
     5518     3015
     8251     2818

Posterior Predictive Samples

By including the observed observational errors for setting priors on the Gaussian noise of each observation, the fitted median passes through each of the observed points.

Constant (non-zero) Mean function

  • Constant mean function
  • Prior on mean function value
  • Include data on error in observations of \(y_i\)
  • Heteroskedastic (Gaussian) noise

\[y_i \sim \mathcal{N}(f(x_i), \sigma_i^2)\]

\[f \sim \mathcal{GP}(C, k(x, x'))\]

\[C \sim \mathcal{U}[0,1]\]

\[k(x,x') = \eta^2 \exp\left\{ -\frac{1}{2}\frac{(x - x')^2}{\ell^2}\right\}\]

\[\ell \sim \mathrm{InvGamma}(5,5)\]

\[\eta \sim \mathcal{N}^+(0,1)\]

\[\sigma_i \sim \mathcal{N}^+(\textrm{stderr}(y_i), \mathrm{Var}(\textrm{stderr}(\boldsymbol{y})))\]

MCMC Results

 variable     mean   median       sd      mad       q5      q95     rhat
 eta      0.007869 0.007660 0.001386 0.001252 0.005995 0.010521 1.000645
 ell      1.562706 1.141115 1.230891 0.583181 0.557737 4.617188 1.001045
 C        0.190356 0.190383 0.001761 0.001681 0.187439 0.193192 1.001547
 sigma[1] 0.000393 0.000392 0.000041 0.000041 0.000324 0.000461 1.000504
 ess_bulk ess_tail
     3322     1919
     2440     1297
     3393     1961
     4224     2856

Posterior Predictive Samples

Fixed constant (non-zero) Mean function

  • Constant mean function set at fixed value, e.g., mean of observations
  • Include data on error in observations of \(y_i\)
  • Heteroskedastic (Gaussian) noise

\[y_i \sim \mathcal{N}(f(x_i), \sigma_i^2)\]

\[f \sim \mathcal{GP}(C, k(x, x'))\]

\[k(x,x') = \eta^2 \exp\left\{ -\frac{1}{2}\frac{(x - x')^2}{\ell^2}\right\}\]

\[\ell \sim \mathrm{InvGamma}(5,5)\]

\[\eta \sim \mathcal{N}^+(0,1)\]

\[\sigma_i \sim \mathcal{N}^+(\textrm{stderr}(y_i), \mathrm{Var}(\textrm{stderr}(\boldsymbol{y})))\]

Mean Function = 0.2

Mean Function = 0.195

Mean Function = 0.18

Matern 3/2 kernel, zero mean

\[y_i \sim \mathcal{N}(f(x_i), \sigma_i^2)\]

\[f \sim \mathcal{GP}(\boldsymbol{0}, k(x, x'))\]

\[k(x,x') = \eta^2 \left( 1 + \frac{\sqrt{3(x - x')^2}}{\ell}\right) \exp\left\{ -\frac{\sqrt{3(x - x')^2}}{\ell}\right\}\]

\[\ell \sim \mathrm{InvGamma}(5,5)\]

\[\eta \sim \mathcal{N}^+(0,1)\]

\[\sigma_i \sim \mathcal{N}^+(\textrm{stderr}(y_i), \mathrm{Var}(\textrm{stderr}(\boldsymbol{y})))\]

MCMC Results

 variable      mean    median        sd       mad        q5        q95     rhat
 eta       0.154295  0.138354  0.064650  0.043151  0.090575   0.273144 1.000380
 ell      77.002830 72.413850 22.854942 18.158514 49.429170 120.076800 1.000103
 sigma[1]  0.000393  0.000393  0.000041  0.000041  0.000327   0.000460 1.000717
 ess_bulk ess_tail
     4108     2402
     3854     2180
     7848     2658

Posterior Predictive Samples

SE + Matern 3/2 additive kernel

\[y_i \sim \mathcal{N}(f(x_i), \sigma_i^2)\]

\[f \sim \mathcal{GP}(\boldsymbol{0}, k(x, x'))\]

\[k(x,x') = \eta^2 \left[ \exp\left\{ -\frac{1}{2}\frac{(x - x')^2}{\ell_\mathrm{SE}^2}\right\} + \left( 1 + \frac{\sqrt{3(x - x')^2}}{\ell_\mathrm{M}}\right) \exp\left\{ -\frac{\sqrt{3(x - x')^2}}{\ell_\mathrm{M}}\right\} \right]\]

\[\ell_\mathrm{SE} \sim \mathrm{InvGamma}(5,5)\]

\[\ell_\mathrm{M} \sim \mathrm{InvGamma}(5,5)\]

\[\eta \sim \mathcal{N}^+(0,1)\]

\[\sigma_i \sim \mathcal{N}^+(\textrm{stderr}(y_i), \mathrm{Var}(\textrm{stderr}(\boldsymbol{y})))\]

 variable      mean    median        sd       mad        q5       q95     rhat
 eta       0.010996  0.008687  0.008222  0.004604  0.003972  0.025317 1.000166
 ell_SE   40.137106 34.952250 21.529021 13.219603 19.576910 75.923105 1.000316
 ell_M    54.647213 52.715750 14.041654 12.820265 35.660370 80.404210 1.000264
 sigma[1]  0.000394  0.000394  0.000041  0.000039  0.000327  0.000461 0.999895
 ess_bulk ess_tail
     3901     2446
     4500     2262
     3855     2547
     6845     2694

Posterior Predictive Samples

SE x Matern 3/2 multiplicative kernel

\[y_i \sim \mathcal{N}(f(x_i), \sigma_i^2)\]

\[f \sim \mathcal{GP}(\boldsymbol{0}, k(x, x'))\]

\[k(x,x') = \eta^2 \exp\left\{ -\frac{1}{2}\frac{(x - x')^2}{\ell_\mathrm{SE}^2}\right\}\left( 1 + \frac{\sqrt{3(x - x')^2}}{\ell_\mathrm{M}}\right) \exp\left\{ -\frac{\sqrt{3(x - x')^2}}{\ell_\mathrm{M}}\right\}\]

\[\ell_\mathrm{SE} \sim \mathrm{InvGamma}(5,5)\]

\[\ell_\mathrm{M} \sim \mathrm{InvGamma}(5,5)\]

\[\eta \sim \mathcal{N}^+(0,1)\]

\[\sigma_i \sim \mathcal{N}^+(\textrm{stderr}(y_i), \mathrm{Var}(\textrm{stderr}(\boldsymbol{y})))\]

MCMC Results

  variable      mean   median        sd      mad       q5       q95     rhat
 eta        0.037945 0.035572  0.023139 0.017537 0.011372  0.072392 1.170342
 ell_SE    21.972784 1.705505 29.944829 1.558493 0.592920 77.967890 1.619411
 ell_M     30.591427 1.660200 38.765222 1.471168 0.601296 97.911300 1.635714
 sigma[1]   0.000393 0.000393  0.000041 0.000042 0.000323  0.000461 1.001599
 f_star[1]  0.183928 0.183926  0.000398 0.000388 0.183284  0.184594 1.001590
 ess_bulk ess_tail
       16       65
        6       78
        6       71
     5349     2490
     3628     3852

Posterior Predictive Samples

Matern 3/2 + QP kernel

\[y_i \sim \mathcal{N}(f(x_i), \sigma_i^2)\]

\[f \sim \mathcal{GP}(\boldsymbol{0}, k(x, x'))\]

\[k(x,x') = \eta^2 \left[ \exp\left\{ -\frac{2 \sin^2\left( \pi\frac{\sqrt{(x - x')^2}}{T}\right)}{\ell_\mathrm{P}^2}\right\} + \exp\left\{ -\frac{1}{2}\frac{(x - x')^2}{\ell_\mathrm{SE}^2}\right\} + \left( 1 + \frac{\sqrt{3(x - x')^2}}{\ell_\mathrm{M}}\right) \exp\left\{ -\frac{\sqrt{3(x - x')^2}}{\ell_\mathrm{M}}\right\} \right]\]

\[\ell_\mathrm{P} \sim \mathrm{InvGamma}(5,5)\]

\[\ell_\mathrm{SE} \sim \mathrm{InvGamma}(5,5)\]

\[\ell_\mathrm{M} \sim \mathrm{InvGamma}(5,5)\]

\[\eta \sim \mathcal{N}^+(0,1)\]

\[T \sim U(\textrm{min_gap}(\boldsymbol{x}), \textrm{range}(\boldsymbol{x}))\]

\[\sigma_i \sim \mathcal{N}^+(\textrm{stderr}(y_i), \mathrm{Var}(\textrm{stderr}(\boldsymbol{y})))\]

 variable     mean   median      sd     mad      q5      q95   rhat ess_bulk
   eta      0.0040   0.0031  0.0035  0.0015  0.0015   0.0083 1.0707       37
   ell_SE  23.6680  22.3852 15.9193 10.2359  0.9128  47.8225 1.2686       10
   ell_M   32.3308  34.6120 16.2567 10.8554  0.9071  54.1578 1.2838       10
   ell_P    2.5511   2.2592  1.2934  0.8951  1.2297   5.0154 1.0145      394
   T      103.3865 110.6080 34.2538 33.0143 27.5613 144.0561 1.0606       44
 ess_tail
      244
       15
       17
     2197
       46

Posterior Predictive Samples